# Image-to-Text Conversion
Qwen Qwen2.5 VL 7B Instruct GGUF
Apache-2.0
A llama.cpp quantization of Qwen2.5-VL-7B-Instruct that supports multimodal tasks such as image-to-text conversion.
Image-to-Text, English
bartowski
2,056
2
Gemma 3 12B It Qat GGUF
Gemma 3 12B IT is a large language model developed by Google, supporting multimodal input and long-context processing.
Image-to-Text
lmstudio-community
36.65k
4
Gemma 3 Glitter 4B
An optimized model based on Gemma 3 4B, using the same data mixing scheme as Glitter 12B.
Large Language Model
allura-org
140
3
Google.gemma 3 27b Pt GGUF
Gemma 3 27B is a large-scale pre-trained language model developed by Google, with 27 billion parameters, suitable for various natural language processing tasks.
Large Language Model
DevQuasar
477
1
Huihui Ai.granite Vision 3.2 2b Abliterated GGUF
Granite Vision 3.2 2B Abliterated is a vision-language model focused on image-to-text conversion tasks.
Image-to-Text
DevQuasar
724
1
Llava Maid 7B DPO GGUF
LLaVA is a large language and vision assistant model capable of handling multimodal tasks involving images and text.
Image-to-Text
megaaziib
99
4
Donut Base Finetuned SOGC Archive Trademarks 1883 2001
A multilingual image-to-text model for identifying and parsing historical trademark documents, supporting German, Italian, and French.
Image-to-Text
Transformers, Supports Multiple Languages
Travad98
24
0
Git Base Textcaps
MIT
GIT is a Transformer-based generative image-to-text model capable of converting visual content into descriptive text.
Image-to-Text
Transformers, Supports Multiple Languages
microsoft
482
8
Featured Recommended AI Models